Author: Wei Li, weililab.org

Note This R markdown file provides a simple and convenient web report for the MAGeCK results. For experienced users, MAGeCKFlute R package is recommended as it provides additional visualization functionalities.

Parameters

comparison_name is the prefix of your output file, defined by the “-n” parameter in your “mageck test” command. The system will look for the following files to generate this report:

# define the comparison_name here; for example,
# comparison_name='demo'
comparison_name='D34_D4_w_controls'

FDR cutoff is used to draw a boundary line in RRA or p value plots. Set it to -1 to disable the boundary line.

fdrcutoff=0.05
# fdrcutoff=-1 # disable FDR cutoff line

Preprocessing

Reading input files. If any of these files are problematic, an error message will be shown below.

gstable=read.table(gene_summary_file,header = T,as.is = T,na.strings='')
sg_table=read.table(sgrna_summary_file,header = T,as.is = T,na.strings='')
comp_samples=getcomparisonsfromlogfile(log_file)
collabel=c(comp_samples[[1]],comp_samples[[2]])

Summary

The samples used in the comparison is indicated as follows.

Sample summary
Sample Type
D4-REP1 control
D4-REP2 control
D4-REP3 control
D34-REP1 treatment
D34-REP2 treatment
D34-REP3 treatment

The statistics of comparisons is as indicated in the following table.

Comparion summary
Comparison Genes Selection FDR1% FDR5% FDR25%
D34_D4_w_controls 1210 neg 0 0 0
D34_D4_w_controls 1210 pos 194 233 233

The meanings of the columns are as follows.

Plotting invidivual genes in negative selection

The following figures show:

The following genes are used in the plot (change it as you like). By default, it is the top 5 genes in negatively selected genes.

targetgenelist_neg=gstable[gstable[,6]<=5,1]

# or, directly specify the genes to be plotted
#targetgenelist_neg=c("ACTR8","ACIN1")

# display genes used in the plot
print(targetgenelist_neg)
## [1] "V173E"   "V172D"   "R174R"   "V172A"   "V173Ter"

The following figure plots the distribution of RRA scores across these genes. Dotted lines represent the FDR cutoff line defined by the “fdrcutoff” value in the “Paramters” section.

The following figure plots the distribution of p values in these genes. Dotted lines represent the FDR cutoff line defined by the “fdrcutoff” value in the “Paramters” section.

Plotting invidivual genes in positive selection

The following genes are used in the plot (change it as you like). By default, it is the top 5 genes in negatively selected genes.

targetgenelist_pos=gstable[gstable[,12]<5,1]

# or, directly specify the genes to be plotted
#targetgenelist_pos=c("ACTR8","ACIN1")

# display genes used in the plot
print(targetgenelist_pos)
## [1] "V197L" "D393N" "R196Q" "L22L"

The following figure plots the distribution of RRA scores across these genes. Dotted lines represent the FDR cutoff line defined by the “fdrcutoff” value in the “Paramters” section.

The following figure plots the distribution of p values in these genes. Dotted lines represent the FDR cutoff line defined by the “fdrcutoff” value in the “Paramters” section.

sgRNA changes

The following figures show the distribution of sgRNA read counts (normalized) of selected genes in selected samples.

for(target_gene in c(targetgenelist_neg,targetgenelist_pos)){
  plotindvidualsgrnas(sg_table,target_gene,collabel)
}

The following figures show the distribution of sgRNA log2 fold changes of selected genes in selected samples.

for(target_gene in c(targetgenelist_neg,targetgenelist_pos)){
  plotindvidualsgrnas_lfc(sg_table,target_gene,collabel)
}

Enrichment analysis

The following table and figure uses Gene Set Enrichment Analysis (GSEA) to estimate the dropout of core essential genes (defined by the Johannes Zuber lab and also used in our MAGeCKFlute R package).

Note: fgsea R package is required.

library(fgsea)


#core_ess_gene=read.table('~/Dropbox/work/TengFei/design/mageck-flute/Core_Essentialome_Zuberlab_v2.txt',sep='\t',header = T,quote='',comment.char = '')

gene_id=gstable$id
if(sum(gene_id%in%core_ess_gene_symbol)<6){
  print('Not enough gene symbols found in essential gene list. Skip the enrichment step.')
}else{

  ranks=log10(gstable$neg.score)

  ranks=nrow(gstable):1/nrow(gstable)

  ranks=ranks*2-1

  names(ranks)=gstable$id


  gset=gstable$id[gene_id%in%core_ess_gene_symbol]
  gsetlist=list(zuber=gset)

  #fgseaRes <- fgsea(gsetlist, ranks, minSize=15, maxSize = 500, nperm=1000)

  fgseaRes <- fgseaMultilevel(gsetlist, ranks, minSize=15, maxSize = 500,scoreType = 'std',eps = 0)
  fgseaRes=fgseaRes[order(fgseaRes$pval),]


  #print(head(fgseaRes))
  print(kable(fgseaRes,caption='Enrichment results'))

  plotEnrichment(gsetlist$zuber, ranks)

}
## [1] "Not enough gene symbols found in essential gene list. Skip the enrichment step."